03. Sentiment Analysis
Sentiment Analysis is probably one of the most popular uses of Natural Language Processing these days. It sounds like a toy problem, but in reality, it has become an important tool for many purposes, e.g.:
- Understanding customer sentiment around a company for making investment decisions.
- Getting a feedback signal for social media and advertising campaigns.
- Providing a quantitative measure for book and movie reviews, and so on.
Sentiment Analysis as a Supervised Learning Problem
From a machine learning perspective, you can frame this as either a classification or a regression problem. It depends on whether you want to predict discrete sentiment labels, such as positive vs. negative, or a real number that captures a more fine-grained sentiment value.
In either case, you start with a corpus, say of movie reviews, treat each review as an individual document, and extract a set of features that represent it. This representation can be a document-level one, such as Bag-of-Words or TF-IDF, or a sequence of word vectors.
Which representation to use depends on the model you choose. For example, an SVM that predicts sentiment labels can work with Bag-of-Words features, whereas an RNN needs a sequence of word vectors as input.
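To make the document-level representations concrete, here is a minimal sketch of Bag-of-Words and TF-IDF vectors built from scratch. The regex tokenizer, the three toy reviews, and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 are assumptions chosen for illustration, not the lab's method; libraries such as scikit-learn provide production versions (`CountVectorizer`, `TfidfVectorizer`).

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(doc_tokens, vocab):
    """Raw term counts, one slot per vocabulary word."""
    counts = Counter(doc_tokens)
    return [counts[w] for w in vocab]

def tf_idf(docs_tokens, vocab):
    """Term frequency (count / doc length) times a smoothed inverse document frequency."""
    n_docs = len(docs_tokens)
    doc_sets = [set(toks) for toks in docs_tokens]
    # Document frequency: how many documents contain each word
    df = {w: sum(1 for s in doc_sets if w in s) for w in vocab}
    # Assumed smoothed IDF variant: rarer words get larger weights
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for toks in docs_tokens:
        counts = Counter(toks)
        total = len(toks)
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    return vectors

# Hypothetical toy corpus
reviews = [
    "A great movie with great acting.",
    "A dull movie.",
    "Great fun!",
]
docs = [tokenize(r) for r in reviews]
vocab = sorted({w for toks in docs for w in toks})
print(vocab)                         # ['a', 'acting', 'dull', 'fun', 'great', 'movie', 'with']
print(bag_of_words(docs[0], vocab))  # [1, 1, 0, 0, 2, 1, 1]

weights = tf_idf(docs, vocab)
print([round(v, 2) for v in weights[1]])  # [0.43, 0.0, 0.56, 0.0, 0.0, 0.43, 0.0]
```

In the second review, "dull" gets a higher weight than "movie" because it appears in fewer documents; this is the main thing TF-IDF adds over raw counts.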
Then pick an appropriate loss function to train your model, such as categorical cross-entropy for classification or mean squared error for regression.
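As a rough illustration of the two losses, the sketch below computes them by hand on tiny made-up batches. The class ordering [negative, positive], the predicted probabilities, and the 1-5 star ratings are all hypothetical; in practice a framework's built-in loss would be used.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross-entropy between one-hot labels and predicted class probabilities."""
    total = 0.0
    for truth, probs in zip(y_true, y_pred):
        # Only the true class contributes, since the other one-hot entries are 0;
        # eps guards against log(0)
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(truth, probs))
    return total / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between target and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification: classes ordered as [negative, positive]
labels = [[0, 1], [1, 0]]
probs = [[0.2, 0.8], [0.6, 0.4]]
print(round(categorical_cross_entropy(labels, probs), 3))  # 0.367

# Regression: hypothetical 1-5 star ratings
ratings = [4.0, 2.0, 5.0]
predicted = [3.5, 2.5, 4.0]
print(mean_squared_error(ratings, predicted))  # 0.5
```

Cross-entropy only looks at the probability assigned to the correct class, so it rewards confident correct predictions; MSE penalizes large numeric errors quadratically, which suits a continuous sentiment score.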
Challenges: Sense Ambiguity
Consider these two brief movie reviews:
- “I expected this movie to be much better.”
- “This movie was much better than I expected.”
A typical Bag-of-Words representation treats these two reviews as nearly identical, since they share most of their words, even though they express opposite sentiments.
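Counting the words in each review makes the problem visible. This sketch assumes a simple lowercase regex tokenizer and compares the two word-count vectors with cosine similarity.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Word counts after lowercasing and a crude regex tokenization."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(c1, c2):
    """Cosine of the angle between two count vectors (Counter returns 0 for missing words)."""
    dot = sum(count * c2[word] for word, count in c1.items())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)

a = bag_of_words("I expected this movie to be much better.")
b = bag_of_words("This movie was much better than I expected.")

print(sorted(a & b))                      # ['better', 'expected', 'i', 'movie', 'much', 'this']
print(sorted(a - b), sorted(b - a))       # ['be', 'to'] ['than', 'was']
print(round(cosine_similarity(a, b), 2))  # 0.75
```

Six of the eight words in each review are shared, and the words that differ ("to", "be" vs. "was", "than") carry little sentiment on their own, so the two vectors end up highly similar.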
But if the model reads the words as a sequence, differences in word order can help it learn to tell the two sentiments apart.
RNN-based models have proved very successful at this task because they treat text as a sequence and accumulate information across time steps.
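The order sensitivity can be illustrated with a toy vanilla RNN that applies the update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) once per word. Everything here is an assumption for illustration: the word vectors and weights are random and untrained, and the dimensions are tiny. Feeding the same words forwards and reversed gives the same averaged (order-free) vector, but different final RNN states.

```python
import math
import random

random.seed(0)
EMB_DIM, HIDDEN_DIM = 4, 3

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

tokens = "i expected this movie to be much better".split()
# Hypothetical random word vectors standing in for learned embeddings
embeddings = {w: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for w in sorted(set(tokens))}
# Untrained, randomly initialized RNN parameters
W_xh = rand_matrix(HIDDEN_DIM, EMB_DIM)
W_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
b_h = [0.0] * HIDDEN_DIM

def rnn_final_state(words):
    """Run the recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) and return the last h."""
    h = [0.0] * HIDDEN_DIM
    for w in words:
        x = embeddings[w]
        h = [math.tanh(u + v + b) for u, v, b in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
    return h

def mean_vector(words):
    """Order-free summary: the average of the word vectors."""
    return [sum(embeddings[w][i] for w in words) / len(words) for i in range(EMB_DIM)]

forward = tokens
backward = list(reversed(tokens))

same_mean = all(math.isclose(x, y, abs_tol=1e-12)
                for x, y in zip(mean_vector(forward), mean_vector(backward)))
print("averaged vectors equal:", same_mean)
print("RNN state (forward): ", [round(v, 3) for v in rnn_final_state(forward)])
print("RNN state (backward):", [round(v, 3) for v in rnn_final_state(backward)])
```

Because each hidden state depends on the previous one, the final state encodes the order in which words arrived; training then adjusts the weights so that this order information becomes useful for predicting sentiment.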
Lab: Sentiment Analysis
Time for you to perform sentiment analysis on a popular movie reviews dataset!
Clone the AIND-NLP repository if you haven't already (or pull the latest commit):
https://github.com/udacity/AIND-NLP
Then launch the Jupyter notebook sentiment_analysis.ipynb and follow the instructions in the notebook to complete this lab.